refactor(BA-5650): unify session ownership to owner_id and main_access_key [1/2]#11050
Closed
jopemachine wants to merge 61 commits into
Closed
refactor(BA-5650): unify session ownership to owner_id and main_access_key [1/2]#11050jopemachine wants to merge 61 commits into
jopemachine wants to merge 61 commits into
Conversation
3 tasks
This was referenced Apr 14, 2026
Two independent additions that other parts of the BA-5650 stack build on: - ``UserRepository.get_main_access_key_by_id`` (+ the matching ``UserDBSource`` method) returns ``main_access_key`` for a given user UUID with a single readonly query. Used by downstream refactors that need to resolve the keypair access_key on demand from the owner. - ``SessionConditions.by_access_key_*`` filters in ``models/session/conditions.py`` now resolve through a subquery on ``UserRow.main_access_key`` keyed by ``SessionRow.user_uuid``, instead of the ``sessions.access_key`` column. A new ``_owners_where_main_access_key`` staticmethod centralises the subquery shape. Behavior is unchanged on ``main`` where ``sessions.access_key`` still mirrors the owner's main_access_key; this change just lets us drop the redundant column later.
Slice A scope is only the new get_main_access_key_by_id helper and the SessionConditions rewrite. Accidentally dropping the target_main_access_key argument from delegate_endpoint_ownership belongs to a later slice that also updates EndpointRow and the user service; restoring the signature here so this PR builds standalone.
ba76760 to
4303676
Compare
0b74482 to
6180d2d
Compare
4303676 to
557b59a
Compare
6180d2d to
8bb85b0
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR removes the owner_access_key parameter from REST v1 session endpoints and shifts delegation to owner_id: UUID (only for session creation endpoints), while making read/control endpoints always operate as the authenticated caller.
Changes:
- Remove
owner_access_keyfrom REST v1 session request DTOs and update handlers to useowner_idfor delegated session creation only. - Drop
AuthProcessors/resolve_access_key_scopedependency from the v1 session handler and update route wiring accordingly. - Update REST v2 session handler + session adapter signatures to stop threading
access_keyfor shutdown/logs/update, and adjust tests + add breaking-change note.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/common/dto/manager/session/test_request.py | Updates request DTO tests for removed fields (but now contains some no-op assertions). |
| tests/component/session/test_session_query.py | Removes AuthProcessors wiring for session handler in component test server setup. |
| tests/component/session/test_session_create_delegation.py | Switches delegation input from owner_access_key to owner_id in the component test. |
| tests/component/session/conftest.py | Removes AuthProcessors wiring for session handler in component test fixtures. |
| src/ai/backend/manager/api/rest/v2/session/handler.py | Removes user_ctx/access_key threading for shutdown/logs/update calls into adapter. |
| src/ai/backend/manager/api/rest/tree.py | Updates REST v1 session handler construction after dropping auth dependency. |
| src/ai/backend/manager/api/rest/session/handler.py | Core v1 session handler updates: remove access-key scope resolution; use owner_id for delegation on create endpoints; enforce caller-only behavior elsewhere. |
| src/ai/backend/manager/api/adapters/session.py | Removes adapter access_key plumbing for shutdown/logs/update; updates node mapping fields to owner_id/main_access_key. |
| src/ai/backend/common/dto/manager/session/request.py | Removes owner_access_key fields from v1 request DTOs; adds owner_id to creation DTOs. |
| changes/11050.breaking.md | Documents the breaking API change for clients. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Drop the redundant UserRow.main_access_key IS NOT NULL guard inside SessionConditions._owners_where_main_access_key. The subquery selects users.uuid (non-null PK), so NOT IN is well-defined; NULL main_access_key fails the caller-supplied condition on its own. The behaviour is unchanged — this just makes the helper contract clearer. Also drop the stale misc news fragment (skip:changelog applies to this purely additive refactor).
Rename UserPermission.user_uuid to owner_id and add main_access_key: str | None. KernelRow.to_kernel_info populates the new field from the eagerly-loaded user_row relationship; search_kernels now selectinload's user_row so the scalar is available without N+1. KernelRow.delegate_ownership drops its access_key argument (resolved from the owner). recalc_concurrency_used switches from KernelRow.access_key to a KernelRow.user_uuid subquery keyed by the owner's main_access_key as a shim for the eventual access_key column drop. Test fixtures that construct UserPermission are updated.
Earlier the branch pulled test fixtures that also referenced later slices' renames (SessionMetadata.owner_id, ScheduledSessionData .main_access_key, SessionDataForPull/Start.main_access_key, SessionData.owner_id). Revert those fixtures to main state and re-apply only the UserPermission field renames (user_uuid -> owner_id, access_key -> main_access_key) which are in this slice's scope. Also update the two remaining live readers of UserPermission: - sokovan/scheduler/fair_share/aggregator.py - api/adapters/session.py (kernel_info_to_node) And fix the SessionRow.delegate_ownership caller of KernelRow.delegate_ownership to match the new single-argument signature.
- Tests for SessionEnqueueData / KernelEnqueueData / TerminatingSessionData now use ``main_access_key`` and ``owner_id`` (renamed in slice F/G). - ``SessionAdapter._session_data_to_node`` reads ``data.owner_id`` and drops the obsolete ``data.access_key``; the SessionMetadata GQL DTO's ``access_key`` field is now an empty string until the read-time resolver lands. - ``cache_invalidation`` uses ``info.access_key`` (the field name on SessionTransitionInfo). - ``ShutdownServiceAction``/``GetContainerLogsAction``/``RenameSessionAction`` no longer accept ``owner_access_key``; drop the kwarg in the GQL adapter. - ``ModelServingRepository.get_session_by_id`` drops the now-removed positional ``owner_access_key`` arg from ``SessionRow.get_session``. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
f3342b7 to
cd02a14
Compare
``SessionRepository`` and the underlying ``SessionDBSource`` now take ``owner_id: UUID`` on every method that previously accepted ``owner_access_key: AccessKey``. Affects: - ``get_session_validated`` - ``match_sessions`` - ``update_session_name`` - ``find_dependency_sessions`` / ``_find_dependent_sessions`` - ``get_target_session_ids`` - ``get_session_with_group`` The matching ``dependency_graph`` helpers and ``creators`` are updated in lockstep. Service-layer callers still pass ``owner_access_key`` temporarily; they will be migrated in the next slice.
Rename access_key -> main_access_key on sokovan data types
(SessionAllocation, PreparedSessionData, SessionDataForPull,
SessionDataForStart, SessionWorkload) and update every sokovan caller
accordingly. Affects:
- sokovan/data/{allocation,lifecycle,workload}.py
- sokovan/scheduler/handlers/lifecycle/*
- sokovan/scheduler/handlers/maintenance/sweep_sessions.py
- sokovan/scheduler/provisioner/{provisioner,sequencers,validators}/*
- sokovan/scheduler/launcher/launcher.py
- sokovan/scheduler/post_processors/cache_invalidation.py
- sokovan/scheduler/fair_share/aggregator.py
- sokovan/scheduling_controller/{preparers,scheduling_controller}.py
- sokovan/deployment/{executor,route}.py
No external behavior change.
- Drop ``owner_id`` from 21 read/control session action dataclasses;
keep it only on ``create_from_template`` / ``create_from_params`` /
``create_cluster`` (the three delegation-capable creation paths).
- ``services/session/service.py``: add ``_requester_user_id()`` that
pulls the caller's UUID from the ``current_user()`` context var,
replacing 21 ``owner_id = action.owner_id`` sites.
- ``services/session/lifecycle.py``: inject ``UserRepository`` via the
constructor instead of instantiating it internally.
- ``registry.py``: accept ``UserRepository`` and forward it into
``SessionLifecycleManager``.
- ``dependencies/agents/{registry,composer}.py``: wire the repository
through the dependency graph.
- ``repositories/user/{repository,db_source/db_source}.py``: add
``get_main_access_key_by_id`` (renamed from ``_by_uuid``).
- Minor adapter/service touch-ups in ``services/export`` and
``repositories/model_serving``.
Clean up the ``changes/BA-5650-{D,E,F}.misc.md`` files that were superseded when each slice's news fragment was renamed to the assigned PR number (e.g. ``changes/11046.enhance.md``). These stragglers made it onto downstream branches during the cascade rebases.
- Tests for SessionEnqueueData / KernelEnqueueData / TerminatingSessionData now use ``main_access_key`` and ``owner_id`` (renamed in slice F/G). - ``SessionAdapter._session_data_to_node`` reads ``data.owner_id`` and drops the obsolete ``data.access_key``; the SessionMetadata GQL DTO's ``access_key`` field is now an empty string until the read-time resolver lands. - ``cache_invalidation`` uses ``info.access_key`` (the field name on SessionTransitionInfo). - ``ShutdownServiceAction``/``GetContainerLogsAction``/``RenameSessionAction`` no longer accept ``owner_access_key``; drop the kwarg in the GQL adapter. - ``ModelServingRepository.get_session_by_id`` drops the now-removed positional ``owner_access_key`` arg from ``SessionRow.get_session``. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Scheduler / predicates / scheduler-type collapse of the owner key: - ``scheduler/predicates.py``: predicates now take SessionRow and resolve ``main_access_key`` from the owner via a helper when a keypair-scoped lookup (Redis concurrency, keypair resource policy) is required. Renames ``SessionRow.user_uuid`` references throughout. - ``scheduler/drf.py``: per-user fairness tracking keyed by ``owner_id``/``main_access_key`` pair. - ``repositories/scheduler/options.py``: drop the duplicated ``by_access_key_*`` factories — session filters go through ``SessionConditions`` helpers instead. - ``repositories/scheduler/types/*``: rename ``access_key`` to ``main_access_key`` on ``ScheduledSessionData``, ``TerminatingSessionData``, ``SweptSessionInfo``, ``KernelEnqueueData`` and ``SessionEnqueueData``. - ``repositories/events/db_source/db_source.py`` and ``repositories/stream/db_source/db_source.py``: resolve the owner UUID from ``main_access_key`` via a sub-select shim while the schema still exposes ``sessions.access_key``.
REST v1 session endpoints no longer accept owner_access_key. The delegation field is replaced with owner_id (user UUID) and is honored only on the three session-creation endpoints (create_from_template / create_from_params / create_cluster). Read and control endpoints always act as the authenticated caller. - common/dto/manager/session/request.py: drop the owner_access_key field from Create/Destroy/Restart/GetContainerLogs/GetStatusHistory request DTOs; add owner_id to the three creation DTOs. - api/rest/session/handler.py: remove all 26 resolve_access_key_scope calls, drop the AuthProcessors dependency, and renumber log format placeholders that dropped the owner argument. - api/rest/v2/session/handler.py: drop the redundant user_ctx / access_key arguments from shutdown_service / get_logs / update. - api/adapters/session.py: update adapter call sites accordingly. - api/rest/tree.py: drop the auth= argument from SessionHandler(). Test updates for the corresponding DTO assertions and component fixtures are included.
- api/rest/session/handler.py: fix log format mismatches. The
GET_OR_CREATE and CREAT_CLUSTER logs now include both the requester
access_key and the resolved owner_id. The remaining
SYNC_AGENT_REGISTRY / TRANSIT_STATUS / COMMIT_SESSION /
CONVERT_SESSION_TO_IMAGE / GET_COMMIT_STATUS / GET_ABUSING_REPORT /
GET_STATUS_HISTORY logs drop the redundant second ``/{}`` slot so
the format string matches the provided arguments.
- tests/unit/common/dto/manager/session/test_request.py: replace the
``assert req is not None`` placeholders with asserts that verify
``owner_access_key`` no longer exists and the empty model_dump()
shape; gives us a regression guard for the removed field.
- tests/component/session/test_session_create_delegation.py: rename
the test / class docstring from ``owner_access_key`` to ``owner_id``
so they describe the new delegation shape.
Removed TestRestartSessionRequest and TestGetStatusHistoryRequest which only checked that owner_access_key was no longer present. Trivial attribute checks add no value over the schema definition itself. Also clean up unused imports for the removed test classes. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
After collapsing service signatures and removing ``owner_access_key`` from action DTOs, propagate the rename to remaining call sites: - Drop ``owner_access_key`` kwargs from session action constructors in service tests (MatchSessionsAction, GetStatusHistoryAction, DestroySessionAction, CompleteAction, GetSessionInfoAction, DownloadFilesAction, GetDirectAccessInfoAction, RenameSessionAction, GetContainerLogsAction, ListFilesAction, InterruptSessionAction). - Rename ``user_uuid``/``access_key`` → ``owner_id`` (drop access_key) on ``SessionData`` constructions in tests. - ``SessionTransitionInfo`` keeps ``access_key``; cache_invalidation reverted to ``info.access_key``. - Terminator conftest now uses ``main_access_key`` for ``TerminatingSessionData``. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- TerminatingSessionData uses ``main_access_key`` (was access_key) in test_terminate_sessions. - SessionData fixture in test_session_adapter uses ``owner_id`` and drops the dropped ``access_key`` field. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The cascading rebase pulled in stray files from a developer working copy: tmp-vllm-model/, server.py, vllm.sh, model-definition.yaml, docs/BA-5608-* and docs/model-deployment-zero-downtime.*, and the scripts/ collection of one-off E2E scripts. None of these belong on the release branch. Removing them also unblocks ``pants tailor`` in CI, which had begun suggesting BUILD additions for the orphan source tree. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
f448f27 to
7077091
Compare
…abels Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Member
Author
|
Squashed into PR #11051 — stacked PRs cannot pass CI independently with cross-cutting type changes. |
Cherry-pick test updates from slice I so that this branch passes typecheck independently. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Resolves #10914 (BA-5650)
Summary
main_access_keyresolver helpers and thread throughUserPermissionSessionData.user_uuid→owner_id, collapse repository signatures toowner_id: UUIDowner_idrename through scheduler, sokovan handlers, and coordinatorowner_idviacurrent_user()in service layerowner_access_keyfrom REST v1 session APITest plan
pants lintpassespants checkpasses (mypy: no issues found)Stack
🤖 Generated with Claude Code